Routing by Discriminant Projection: TREC-4 Kok
Authors
Abstract
We present document routing as a standard problem in discriminant analysis. The standard solution involves inverting a large matrix whose dimension is the number of indexed terms. Typically, this solution does not exist because the number of training documents is much smaller than the number of terms. We show that one can project the raw document space into a lower-dimensional space where a solution is possible. Our projection algorithm exploits the characteristics of the empty space, using only the training documents for efficient coding of the relevance information. Its complexity is linear in the number of terms and second order in the number of training documents. We can therefore fully exploit the power of discriminant analysis without imposing severe computational and storage constraints.
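The idea in the abstract can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes the lower-dimensional space is the span of the training documents, built here with a Gram-Schmidt pass (the procedure named in the related TREC-5 abstract), and it adds a small ridge term so the within-class scatter matrix is invertible in the reduced space. The ridge term, the toy data, and the names `gram_schmidt`, `fit_router`, and `score` are all assumptions made for illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(docs, tol=1e-10):
    """Orthonormal basis for the span of the training documents.
    Cost: O(n^2 * T) for n documents over T terms."""
    basis = []
    for d in docs:
        r = list(d)
        for q in basis:
            c = dot(r, q)
            r = [ri - c * qi for ri, qi in zip(r, q)]
        norm = dot(r, r) ** 0.5
        if norm > tol:  # skip documents already inside the span
            basis.append([ri / norm for ri in r])
    return basis

def project(doc, basis):
    """Coordinates of a term-space document in the reduced space."""
    return [dot(doc, q) for q in basis]

def solve(a, b):
    """Gaussian elimination with partial pivoting (small dense system)."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def mean(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def fit_router(docs, labels, ridge=1e-3):
    """Fisher discriminant in the reduced space (ridge is an assumption)."""
    basis = gram_schmidt(docs)
    z = [project(d, basis) for d in docs]
    rel = [zi for zi, y in zip(z, labels) if y]
    non = [zi for zi, y in zip(z, labels) if not y]
    mu_r, mu_n = mean(rel), mean(non)
    k = len(basis)
    # Within-class scatter is k x k, not T x T, so it is cheap to invert.
    sw = [[ridge if i == j else 0.0 for j in range(k)] for i in range(k)]
    for group, mu in ((rel, mu_r), (non, mu_n)):
        for zi in group:
            dlt = [a - b for a, b in zip(zi, mu)]
            for i in range(k):
                for j in range(k):
                    sw[i][j] += dlt[i] * dlt[j]
    w = solve(sw, [a - b for a, b in zip(mu_r, mu_n)])
    return basis, w

def score(doc, basis, w):
    """Higher score means more likely relevant."""
    return dot(project(doc, basis), w)

# Toy data (hypothetical): 6 terms, 4 training documents.
docs = [[2, 1, 0, 0, 0, 1], [1, 2, 0, 0, 1, 0],
        [0, 0, 2, 1, 0, 1], [0, 0, 1, 2, 1, 0]]
labels = [True, True, False, False]
basis, w = fit_router(docs, labels)
print(score([1, 1, 0, 0, 0, 0], basis, w) > score([0, 0, 1, 1, 0, 0], basis, w))
```

The only matrix that needs inverting has one row per basis vector, so its size is bounded by the number of training documents and not by the vocabulary. This matches the complexity the abstract claims: linear in the number of terms and second order in the number of training documents.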
Similar resources
Experiments on Routing, Filtering and Chinese Text Retrieval in TREC-5
We describe our experiments in routing, filtering and Chinese text retrieval. We based our routing and filtering experiments on our discriminant projection algorithm. The algorithm sequentially constructs a series of orthogonal axes from the training documents using the Gram-Schmidt procedure. It then rotates the resulting subspace using principal component analysis so that the axes are ordered b...
Routing as Statistical Classification
In this paper, we compare learning techniques based on statistical classification to traditional methods of relevance feedback for the document routing problem. We consider three classification techniques which have decision rules that are derived via explicit error minimization: linear discriminant analysis, logistic regression, and neural networks. We demonstrate that the classifiers perform 10...
Two-Step Feature Selection and Neural Network Classification for the TREC-8 Routing
At the Caisse des Dépôts et Consignations (CDC), the Agence France-Presse (AFP) news releases are filtered continuously according to the users' interests. Once a user has specified a topic of interest, a filter is customized to fit this user's profile. Until now, these filters would rely on rule-based methods, whose efficiency is proven [Vichot et al., 1999], but which require a large amount of...
New Retrieval Approaches Using SMART: TREC 4
The Smart information retrieval project emphasizes completely automatic approaches to the understanding and retrieval of large quantities of text. We continue our work in TREC 4, performing runs in the routing, ad-hoc, confused text, interactive, and foreign language environments.
Two Steps Feature Selection and Neural Network Classification for the TREC-8 Routing
At the Caisse des Dépôts et Consignations (CDC), the Agence France-Presse (AFP) news releases are filtered continuously according to the users' interests. Once a user has specified a topic of interest, a filter is customized to fit this user's profile. Until now, these filters would rely on rule-based methods, whose efficiency is proven [Vichot et al., 1999], but which require a large amount of...
Publication date: 1996